Simultaneous and exact interval estimates for the contrast of two groups based on an extremely high dimensional variable: application to mass spec data

Authors

  • Yuhyun Park
  • Sean R. Downing
  • Dohyun Kim
  • William C. Hahn
  • Cheng Li
  • Philip W. Kantoff
  • L. J. Wei
Abstract

MOTIVATION: Analysis of high-throughput proteomic/genomic data, in particular surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) data and microarray data, has led to a multitude of techniques aimed at identifying potential biomarkers. Most statistical techniques for comparing two groups are based on qualitative measures such as the P-value. A quantitative approach, such as interval estimation for the contrast between two groups, is more appealing.

RESULTS: We have devised a simultaneous confidence band method, based on a permutation scheme, that detects potential biomarkers discriminating two treatment groups in high-dimensional datasets while controlling the overall confidence coverage level. For example, for the SELDI-TOF MS data, we deal with the entire spectrum simultaneously and construct (1 - alpha) confidence bands for the mean differences between groups. Furthermore, peaks were identified based on the maximal differences between the groups as determined by the confidence bands. The method described here gives both qualitative (P-value) and quantitative (magnitude of difference) information. The Clinical Proteomics Programs Databank's ovarian cancer dataset and data from in-house samples containing known spiked-in proteins were analyzed. We were able to identify potential biomarkers similar to those described in previous analyses of the ovarian cancer data; however, while these markers are highly significant between cancer and normal groups, our analysis indicated that the absolute difference between the two groups was minimal. In addition, we found markers beyond those previously described, with greater differences in average intensities. The proposed confidence band method successfully detected the spiked-in peaks, as well as secondary peaks generated by adducts and double-charged species. We also illustrate our method using paired gene expression data from a prostate cancer microarray experiment by constructing confidence bands for the fold changes between cancer and normal samples.

AVAILABILITY: The R package 'seie.zip' (license: GNU GPL) is publicly available at http://research2.dfci.harvard.edu/dfci/MS_spike-in_data/
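The abstract does not spell out the permutation scheme, so the following is only a minimal sketch of one common way to build permutation-based simultaneous (1 - alpha) confidence bands for per-feature mean differences. It is not the authors' 'seie' package. The assumptions here are ours: each group is centered at its own mean, the pooled centered rows are permuted, and the band half-width is the (1 - alpha) quantile of the maximum absolute standardized difference across features. All function names (`mean_diff_and_se`, `center`, `simultaneous_bands`) are hypothetical.

```python
import random
from math import ceil, sqrt
from statistics import mean, variance


def mean_diff_and_se(x, y):
    """Per-feature difference of group means (x minus y) and its standard error."""
    diffs, ses = [], []
    for j in range(len(x[0])):
        xj = [row[j] for row in x]
        yj = [row[j] for row in y]
        diffs.append(mean(xj) - mean(yj))
        ses.append(sqrt(variance(xj) / len(xj) + variance(yj) / len(yj)))
    return diffs, ses


def center(rows):
    """Subtract each feature's group mean so the group is centered at zero."""
    means = [mean(row[j] for row in rows) for j in range(len(rows[0]))]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def simultaneous_bands(x, y, alpha=0.05, n_perm=500, seed=0):
    """Return one (lower, upper) pair per feature with joint coverage about 1 - alpha.

    Assumption (ours, not the paper's): the critical value is the (1 - alpha)
    quantile of max_j |d_j| / se_j over permutations of the centered, pooled rows.
    """
    rng = random.Random(seed)
    diffs, ses = mean_diff_and_se(x, y)
    pooled = center(x) + center(y)  # rows now share a common (zero) mean
    n_x = len(x)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign rows to the two groups at random
        d, s = mean_diff_and_se(pooled[:n_x], pooled[n_x:])
        max_stats.append(
            max((abs(dj) / sj for dj, sj in zip(d, s) if sj > 0), default=0.0)
        )
    max_stats.sort()
    k = min(n_perm - 1, ceil((1 - alpha) * n_perm) - 1)
    c = max_stats[k]  # one critical value shared by every feature
    return [(dj - c * sj, dj + c * sj) for dj, sj in zip(diffs, ses)]
```

Under these assumptions, features whose band excludes zero are candidate markers, and the band itself reports how large the difference plausibly is, which is the quantitative reading the abstract argues for.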

Related resources

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: With advances in science, knowledge, and technology, we increasingly deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

Quantification of identical and unique segments in ethylene-propylene copolymers using two dimensional liquid chromatography with infra-red detection

Hyphenating High Temperature High Performance Liquid Chromatography (HT-HPLC) with High Temperature Size Exclusion Chromatography (HT-SEC), known as High Temperature Two Dimensional Liquid Chromatography (HT-HPLC x HT-SEC or HT 2D-LC), leads to an isocratic elution in the second dimension, which in turn enables the use of an IR detector (quantitative detection) for monitoring the eluting polymers. Experimental...

An Algorithm based on Predicting the Interface in Phase Change Materials

Phase change materials are substances that absorb and release thermal energy during the process of melting and freezing. This characteristic makes phase change material (PCM) a favourite choice for integration in buildings. The Stefan problem, which includes melting and solidification in PCM materials, is a practical problem in many engineering processes. The position of the moving boundary, its veloci...

Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values

In this paper, we study the estimation problems for the two-parameter exponentiated Gumbel distribution based on lower record values. An exact confidence interval and an exact joint confidence region for the parameters are constructed. A simulation study is conducted to study the performance of the proposed confidence interval and region. Finally, a numerical example with a real data set is gi...

The Effect of Protein Kinase-B on FOXO Autophagy Family Proteins (FOXO1 and FOXO3a) Following High Intensity Interval Training in the Left Ventricle of the Heart of Diabetic Rats by Streptozotocin and Nicotinamide

Background: FOXO family proteins are important factors in autophagy pathway. Protein kinase-B is an important regulator for this family that can be regulated through exercise training. Therefore, the aim of this study is to investigate the effect of protein kinase-B (PKB) on FOXO autophagy family proteins (FOXO1 and FOXO3a) following high intensity interval training (HIIT) in the left ventricle...

Journal:
  • Bioinformatics

Volume 23, Issue 12

Pages -

Publication date: 2007